Studies with Fabricated Switchboard Data: Exploring Sources of Model-data Mismatch

Authors

  • Don McAllaster
  • Larry Gillick
  • Francesco Scattone
  • Mike Newman
Abstract

We present a study of data simulated using acoustic models trained on Switchboard data and then recognized using various Switchboard-trained acoustic models. The Switchboard-trained models yield word error rates (WERs) of about 47 percent on real Switchboard conversations. When data is simulated using the acoustic models, but in a way that ensures that the pronunciations in our recognition dictionary are “perfect”, the WER drops by nearly a factor of five. If instead we use hand-labeled phonetic transcriptions to fabricate data that more realistically represents the way words are pronounced, rendering our recognition pronunciations imperfect, we obtain WERs in the low 40s, rates that are fairly similar to those seen in actual speech data. Taken as a whole, these and other experiments we describe in the paper suggest that there is a substantial mismatch between real speech data and our speech models. The use of simulation in speech recognition research appears to be a promising tool in our efforts to understand and reduce the size of this mismatch.
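For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how a word error rate is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` and the toy sentences are illustrative assumptions; this is not the scoring tool used by the authors.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration only; not the authors' scorer.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Toy example (made-up sentences): one substitution and one deletion
# against a four-word reference.
toy_wer = word_error_rate("a b c d", "a x c")
print(f"WER = {toy_wer:.0%}")
```

Because insertions count as errors, a WER computed this way can exceed 100 percent when the hypothesis is much longer than the reference.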

Related resources

Fabricating conversational speech data with acoustic models: a program to examine model-data mismatch

We present a study of data simulated using acoustic models trained on Switchboard data, and then recognized using various Switchboard-trained acoustic models. When we recognize real Switchboard conversations, simple development models give a word error rate (WER) of about 47 percent. If instead we simulate the speech data using word transcriptions of the conversation, obtaining the pronunciatio...

Exploring Health System Responsiveness in Ambulatory Care and Disease Management and its Relation to Other Dimensions of Health System Performance (RAC) – Study Design and Methodology

Background: The responsiveness of a health system is considered to be an intrinsic goal of health systems and an essential aspect in performance assessment. Numerous studies have analysed health system responsiveness and related concepts, especially across different countries and health systems. However, fewer studies have applied the concept for the evaluation of specific healthcare delivery s...

Resegmentation of SWITCHBOARD

The SWITCHBOARD (SWB) corpus is one of the most important benchmarks for recognition tasks involving large vocabulary conversational speech (LVCSR). The high error rates on SWB are largely attributable to an acoustic model mismatch, the high frequency of poorly articulated monosyllabic words, and large variations in pronunciations. It is imperative to improve the quality of segmentations and tr...

A Meta-Analysis of Scientometric Research Based on the Prevalence of Use of Scientific Databases (Case Study: Domestic Research)

Scientometric research, with its ability to assess scientific research using multiple indicators that explain capacities, scientific performance and technology in different dimensions, has attracted increasing attention from researchers. This study aims to meta-analyze Iranian research in scientometrics from the perspective of the use of scientific databases as data sources in this research fiel...

Acoustic Model Identification Using Inverse Model

Sound measured at various points around the environment can be evaluated by a series of multi-pole sources and their acoustic strength can be acquired. In this numerical study, a method, called the inverse method, was examined to achieve this goal. A variety of arrangements of different sources were considered and the acoustic strength of these sources was acquired. Through the application of t...

Publication date: 1998